ACL-COLING 1998, Montreal, Canada, 491-497, 1998 Improving Data

Author

Hans van Halteren

Abstract

In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second stage classifiers. All combination taggers outperform their best component, with the best combination showing a lower error rate than the best individual tagger.
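
The idea of combining tagger outputs by voting can be illustrated with a minimal sketch. It shows only plain majority voting over per-token tags; the abstract does not specify the voting strategies used, so this is not necessarily one of them. The function name majority_vote, the tie-breaking rule (the earliest-listed tagger among the tied tags wins), and the sample tag sequences are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(tags_per_tagger):
    """Combine per-token tags from several taggers by simple majority vote.

    tags_per_tagger: list of tag sequences, one per tagger, all of equal length.
    Tie-breaking is an assumption: among tied tags, the one proposed by the
    earliest-listed tagger wins.
    """
    combined = []
    for token_tags in zip(*tags_per_tagger):
        counts = Counter(token_tags)
        top = max(counts.values())
        # Take the first tag, in tagger order, that reaches the top count.
        combined.append(next(t for t in token_tags if counts[t] == top))
    return combined


# Hypothetical outputs of four taggers for a three-token sentence.
hmm = ["DT", "NN", "VBZ"]
memory_based = ["DT", "NN", "NNS"]
transformation = ["DT", "JJ", "VBZ"]
max_entropy = ["DT", "NN", "VBZ"]

print(majority_vote([hmm, memory_based, transformation, max_entropy]))
# → ['DT', 'NN', 'VBZ']
```

Each token receives the tag that most of the component taggers agree on, so an error made by a single tagger is outvoted as long as the others are correct.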

Similar resources

ACL-COLING 1998, Montreal, Canada, 491-497, 1998 Improving Data Driven

In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum...

Some Properties of Preposition and Subordinate Conjunction Attachments

In the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL'98), pages 1436-1442, Montréal, Canada. ©1998 Université de Montréal. Determining the attachments of prepositions and subordinate conjunctions is a key problem in parsing natural language. This paper presents a trainable approach to making thes...

Efficient Linear Logic Meaning Assembly

Improving Automatic Indexing through Concept Combination and Term Enrichment

Although indexes may overlap, the output of an automatic indexer is generally presented as a flat and unstructured list of terms. Our purpose is to exploit term overlap and embedding so as to yield a substantial qualitative and quantitative improvement in automatic indexing through concept combination. The increase in the volume of indexing is 10.5% for free indexing and 52.3% for controlled in...
